{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zkufh760uvF3"
      },
      "source": [
        "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n",
        "\n",
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_negation_classifier_demo_biological_texts.ipynb)\n",
        "\n",
        "\n",
        "# Training a Sentiment Analysis Classifier with NLU\n",
        "## 2 Class Biological Negation Classifer Training\n",
        "With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator)  from Spark NLP you can achieve State Of the Art results on any multi class text classification problem\n",
        "\n",
        "This notebook showcases the following features :\n",
        "\n",
        "- How to train the deep learning classifier\n",
        "- How to store a pipeline to disk\n",
        "- How to load the pipeline from disk (Enables NLU offline mode)\n",
        "\n",
        "You can achieve these results or even better on this dataset with training  data  :\n",
        "\n",
        "<br>\n",
        "\n",
        "![image.png]()\n",
        "\n",
        "\n",
        "You can achieve these results or even better on this dataset with test  data  :\n",
        "\n",
        "<br>\n",
        "\n",
        "\n",
        "![Screenshot 2021-02-25 140123.png]()\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dur2drhW5Rvi"
      },
      "source": [
        "# 1. Colab Setup"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "hFGnBCHavltY"
      },
      "source": [
        "!pip install -q johnsnowlabs"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f4KkTfnR5Ugg"
      },
      "source": [
        "# 2. Download   Negation Bilogical Texts dataset\n",
        "https://www.kaggle.com/ma7555/bioscope-corpus-negation-annotated\n",
        "#Context\n",
        "The BioScope corpus consists of medical and biological texts annotated for negation and their linguistic scope. This was done to allow a comparison between the development of systems for negation/hedge detection and scope resolution.\n",
        "The corpus is publicly available for research purposes.\n",
        "\n",
        "You can use this corpus to fine-tune a BERT-like model for negation detection.\n",
        "\n",
        "This dataset was created in this format during the COVID-19 crisis as a training set for detecting negations regarding treatment of specific drugs in the released research papers.\n",
        "\n",
        "Creators of the original dataset: MTA-SZTE Research Group on Artificial Intelligence - RGAI\n",
        "https://rgai.inf.u-szeged.hu/node/105\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "OrVb5ZMvvrQD"
      },
      "source": [
        "! wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/bioscope_abstract/bioscope_abstract.csv\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 424
        },
        "id": "y4xSRWIhwT28",
        "outputId": "f8f31c71-7046-474a-fd36-fb21ac09e091"
      },
      "source": [
        "import pandas as pd\n",
        "train_path = '/content/bioscope_abstract.csv'\n",
        "\n",
        "train_df = pd.read_csv(train_path)\n",
        "# the text data to use for classification should be in a column named 'text' and label column should be named 'y' or 'label' or 'labels'\n",
        "columns=['text','y']\n",
        "train_df = train_df[columns]\n",
        "train_df = train_df.dropna()\n",
        "train_df = train_df.sample(frac=1).reset_index(drop=True)\n",
        "from sklearn.model_selection import train_test_split\n",
        "\n",
        "train_df, test_df = train_test_split(train_df, test_size=0.2)\n",
        "train_df"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                                                    text         y\n",
              "7142   These apparent inconsistencies reflect the com...  positive\n",
              "2305   Different glucocorticoid hormones (GCH) show d...  positive\n",
              "1203   Northern blot analysis of RNA purified from B ...  positive\n",
              "2093   These results indicate that E3 is a hematopoie...  positive\n",
              "7594   During recent years, studies of insulin-gene r...  positive\n",
              "...                                                  ...       ...\n",
              "3894   It can also be distinguished from other previo...  positive\n",
              "2990   We tested the effects of BHA, a phenolic, lipi...  positive\n",
              "11027  Sequence analyses of pCD41 indicate that there...  positive\n",
              "9537   Over a 72-hr period of activation, the express...  positive\n",
              "7699   The B cell NFAT complex, however, was not func...  negative\n",
              "\n",
              "[9594 rows x 2 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-7bd7ea7b-8391-416f-a2fa-a6875f1c512d\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>text</th>\n",
              "      <th>y</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>7142</th>\n",
              "      <td>These apparent inconsistencies reflect the com...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2305</th>\n",
              "      <td>Different glucocorticoid hormones (GCH) show d...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1203</th>\n",
              "      <td>Northern blot analysis of RNA purified from B ...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2093</th>\n",
              "      <td>These results indicate that E3 is a hematopoie...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7594</th>\n",
              "      <td>During recent years, studies of insulin-gene r...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3894</th>\n",
              "      <td>It can also be distinguished from other previo...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2990</th>\n",
              "      <td>We tested the effects of BHA, a phenolic, lipi...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11027</th>\n",
              "      <td>Sequence analyses of pCD41 indicate that there...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9537</th>\n",
              "      <td>Over a 72-hr period of activation, the express...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7699</th>\n",
              "      <td>The B cell NFAT complex, however, was not func...</td>\n",
              "      <td>negative</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>9594 rows × 2 columns</p>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-7bd7ea7b-8391-416f-a2fa-a6875f1c512d')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-7bd7ea7b-8391-416f-a2fa-a6875f1c512d button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-7bd7ea7b-8391-416f-a2fa-a6875f1c512d');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-d75b7929-aff0-4eba-8482-51570d63f102\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-d75b7929-aff0-4eba-8482-51570d63f102')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-d75b7929-aff0-4eba-8482-51570d63f102 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 3
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0296Om2C5anY"
      },
      "source": [
        "# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n",
        "\n",
        "You dataset label column should be named 'y' and the feature column with text data should be named 'text'"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "3ZIPkRkWftBG",
        "outputId": "bdfad334-78a0-49f2-9271-1a727ddaa17a"
      },
      "source": [
        "from johnsnowlabs import nlp\n",
        "from sklearn.metrics import classification_report\n",
        "\n",
        "# load a trainable pipeline by specifying the train. prefix  and fit it on a datset with label and text columns\n",
        "# by default the Universal Sentence Encoder (USE) Sentence embeddings are used for generation\n",
        "trainable_pipe = nlp.load('train.sentiment')\n",
        "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n",
        "\n",
        "# predict with the trainable pipeline on dataset and get predictions\n",
        "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n",
        "#sentence detector that is part of the pipe generates sone NaNs. lets drop them first\n",
        "preds.dropna(inplace=True)\n",
        "print(classification_report(preds['y'], preds['sentiment']))\n",
        "\n",
        "preds"
      ],
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Warning::Spark Session already created, some configs may not take.\n",
            "sent_small_bert_L2_128 download started this may take some time.\n",
            "Approximate size to download 16.1 MB\n",
            "[OK!]\n",
            "              precision    recall  f1-score   support\n",
            "\n",
            "    negative       0.00      0.00      0.00         6\n",
            "    positive       0.88      1.00      0.94        44\n",
            "\n",
            "    accuracy                           0.88        50\n",
            "   macro avg       0.44      0.50      0.47        50\n",
            "weighted avg       0.77      0.88      0.82        50\n",
            "\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                                             document  \\\n",
              "0   These apparent inconsistencies reflect the com...   \n",
              "1   Different glucocorticoid hormones (GCH) show d...   \n",
              "2   Northern blot analysis of RNA purified from B ...   \n",
              "3   These results indicate that E3 is a hematopoie...   \n",
              "4   During recent years, studies of insulin-gene r...   \n",
              "5   To our knowledge, this constitutes the first i...   \n",
              "6   In contrast, gp41 failed to stimulate NF-kappa...   \n",
              "7   Data obtained from studies in our laboratories...   \n",
              "8   RESULTS: Interleukin-6 protein and mRNA produc...   \n",
              "9   When submitted to an in vitro CD4 cross-linkin...   \n",
              "10  Treatment of human resting T cells with phorbo...   \n",
              "11  Distinct DNase-I hypersensitive sites are asso...   \n",
              "12  The biological activity of 11 alpha-methyl-1 a...   \n",
              "13  These observations indicate that the monovalen...   \n",
              "14  The M-CSF receptor was first detectable in the...   \n",
              "15  TNFRI has been recently shown to activate NF-k...   \n",
              "16  The two mutations R217A and R294A caused an in...   \n",
              "17  Cipro did not affect the nuclear transcription...   \n",
              "18  Conversely, diurnal rhythmicity persisted in a...   \n",
              "19  The immunoglobulin heavy chain (IgH) class swi...   \n",
              "20  Four regions in this DNA fragment interact wit...   \n",
              "21  Previous characterization of the GPIX promoter...   \n",
              "22  Neutrophil maturation was impaired in PEBP2bet...   \n",
              "23  Treatment of normal monocytes with 12-0-tetrad...   \n",
              "24  We have cloned the gene for a new ets-related ...   \n",
              "25  Expression and genomic configuration of GM-CSF...   \n",
              "26  Consequently, cytosolic activation, nuclear tr...   \n",
              "27  We propose that Jun plays a bifunctional role ...   \n",
              "28  In B lymphoid cells, deltaspi-B and spi-B mRNA...   \n",
              "29  In order to study CD14 gene regulation, the hu...   \n",
              "30  E3 transcripts were RA-inducible in HL60 cells...   \n",
              "31  IL-7 also delayed the decreases in the levels ...   \n",
              "32  A marked activation of gamma-promoter activity...   \n",
              "33  These results demonstrate that the transcripti...   \n",
              "34  The A6H monoclonal antibody (mAb) recognizes a...   \n",
              "35  The highly specific granulomonocyte-associated...   \n",
              "36  Compared to benign tumor or mammary reduction ...   \n",
              "37  Sequence-specific DNA-binding small molecules ...   \n",
              "38  Structure function analysis of vitamin D analo...   \n",
              "39  To determine the mechanisms responsible for th...   \n",
              "40  DNA band-shift analysis reveals NF-kappa B bin...   \n",
              "41  RESULTS: Both all-trans and 9-cis RA inhibited...   \n",
              "42  Negative selection in the cortex appears to be...   \n",
              "43  Nitric oxide-stimulated guanine nucleotide exc...   \n",
              "44  We show that the ability to detect NF-ATp in a...   \n",
              "45  C3/5 cells developed specific proliferation an...   \n",
              "46  Recent reports demonstrated that ionizing radi...   \n",
              "47  Here we examine molecular mechanisms controlli...   \n",
              "48  These results demonstrate that multiple X1 box...   \n",
              "49  IL-2 and IL-7 were equivalent in their ability...   \n",
              "\n",
              "                 sentence_embedding_small_bert_L2_128 sentiment  \\\n",
              "0   [-0.540424644947052, 0.6268006563186646, -0.67...  positive   \n",
              "1   [-0.7596609592437744, -0.5880932807922363, -0....  positive   \n",
              "2   [-0.5203543305397034, 0.033645179122686386, -0...  positive   \n",
              "3   [-0.7046567797660828, 0.33753958344459534, -0....  positive   \n",
              "4   [-0.7543756365776062, 0.4511456787586212, -0.9...  positive   \n",
              "5   [-1.0517436265945435, -0.0811738669872284, -0....  positive   \n",
              "6   [-0.8322800397872925, 0.5296803116798401, -0.5...  positive   \n",
              "7   [-0.7395010590553284, 0.5016824007034302, -0.5...  positive   \n",
              "8   [-0.06874295324087143, 0.3614305853843689, -0....  positive   \n",
              "9   [-0.24975398182868958, 0.007427605800330639, -...  positive   \n",
              "10  [-0.7139838933944702, -0.15548910200595856, -0...  positive   \n",
              "11  [0.16768714785575867, -0.4549502730369568, -0....  positive   \n",
              "12  [-0.6323443055152893, -0.3998635411262512, -0....  positive   \n",
              "13  [-0.5919932723045349, 0.1734667867422104, -0.8...  positive   \n",
              "14  [-0.526080310344696, 0.1735551655292511, -0.53...  positive   \n",
              "15  [-0.7427854537963867, -0.3252595067024231, -0....  positive   \n",
              "16  [-0.5198838710784912, 0.6203998327255249, -0.6...  positive   \n",
              "17  [-0.010745533742010593, 0.36891499161720276, -...  positive   \n",
              "18  [-0.6558191180229187, -0.4102998673915863, -0....  positive   \n",
              "19  [-0.5788812637329102, 0.16702984273433685, -0....  positive   \n",
              "20  [0.4348486065864563, -0.4957105219364166, -0.9...  positive   \n",
              "21  [-1.4378761053085327, 0.35956352949142456, -0....  positive   \n",
              "22  [-0.9040020704269409, -0.3321942389011383, -0....  positive   \n",
              "23  [-0.38281142711639404, -0.09264473617076874, -...  positive   \n",
              "24  [-0.06995158642530441, 0.0628473237156868, -0....  positive   \n",
              "25  [-0.2674736976623535, 0.6871048212051392, -0.5...  positive   \n",
              "26  [-0.11669447273015976, -0.41803038120269775, -...  positive   \n",
              "27  [-0.3439536988735199, -0.11270888149738312, -0...  positive   \n",
              "28  [-0.026912417262792587, -0.07306383550167084, ...  positive   \n",
              "29  [-0.6894456744194031, 0.6227385997772217, -0.4...  positive   \n",
              "30  [0.055561263114213943, 0.29671669006347656, -0...  positive   \n",
              "31  [-1.2081276178359985, -0.4925746023654938, -0....  positive   \n",
              "32  [-0.7283461093902588, -0.00887372437864542, -0...  positive   \n",
              "33  [-0.8078993558883667, 0.024193232879042625, -0...  positive   \n",
              "34  [-0.6170817017555237, 0.176478773355484, -0.77...  positive   \n",
              "35  [-0.5863608717918396, 0.10614630579948425, -0....  positive   \n",
              "36  [0.0020447946153581142, 0.1792808622121811, -0...  positive   \n",
              "37  [-0.2069147676229477, 0.016566185280680656, -0...  positive   \n",
              "38  [-0.4320926070213318, 0.74772709608078, -0.559...  positive   \n",
              "39  [-0.37579116225242615, 0.5168997049331665, -0....  positive   \n",
              "40  [-0.04819030314683914, 0.04168706387281418, -0...  positive   \n",
              "41  [-0.5226595401763916, -0.33986032009124756, -0...  positive   \n",
              "42  [-0.022223301231861115, -0.22735534608364105, ...  positive   \n",
              "43  [-0.3036770224571228, -0.13188859820365906, -0...  positive   \n",
              "44  [-0.6237607598304749, 0.3612492084503174, -0.4...  positive   \n",
              "45  [-0.26343587040901184, -0.09341304749250412, -...  positive   \n",
              "46  [-0.5906685590744019, 0.20001085102558136, -0....  positive   \n",
              "47  [-0.4508644938468933, 0.2742282450199127, -0.0...  positive   \n",
              "48  [-0.3715269863605499, 0.3266661465167999, -0.3...  positive   \n",
              "49  [-0.7310452461242676, -0.2004045844078064, -0....  positive   \n",
              "\n",
              "   sentiment_confidence                                               text  \\\n",
              "0                   1.0  These apparent inconsistencies reflect the com...   \n",
              "1                   3.0  Different glucocorticoid hormones (GCH) show d...   \n",
              "2                   2.0  Northern blot analysis of RNA purified from B ...   \n",
              "3                   4.0  These results indicate that E3 is a hematopoie...   \n",
              "4                   4.0  During recent years, studies of insulin-gene r...   \n",
              "5                   3.0  To our knowledge, this constitutes the first i...   \n",
              "6                   2.0  In contrast, gp41 failed to stimulate NF-kappa...   \n",
              "7                   1.0  Data obtained from studies in our laboratories...   \n",
              "8                   8.0  RESULTS: Interleukin-6 protein and mRNA produc...   \n",
              "9                   2.0  When submitted to an in vitro CD4 cross-linkin...   \n",
              "10                  5.0  Treatment of human resting T cells with phorbo...   \n",
              "11                  1.0  Distinct DNase-I hypersensitive sites are asso...   \n",
              "12                  4.0  The biological activity of 11 alpha-methyl-1 a...   \n",
              "13                  3.0  These observations indicate that the monovalen...   \n",
              "14                  9.0  The M-CSF receptor was first detectable in the...   \n",
              "15                  9.0  TNFRI has been recently shown to activate NF-k...   \n",
              "16                  3.0  The two mutations R217A and R294A caused an in...   \n",
              "17                  8.0  Cipro did not affect the nuclear transcription...   \n",
              "18                  8.0  Conversely, diurnal rhythmicity persisted in a...   \n",
              "19                  2.0  The immunoglobulin heavy chain (IgH) class swi...   \n",
              "20                  1.0  Four regions in this DNA fragment interact wit...   \n",
              "21                  4.0  Previous characterization of the GPIX promoter...   \n",
              "22                  8.0  Neutrophil maturation was impaired in PEBP2bet...   \n",
              "23                  1.0  Treatment of normal monocytes with 12-0-tetrad...   \n",
              "24                  5.0  We have cloned the gene for a new ets-related ...   \n",
              "25                  4.0  Expression and genomic configuration of GM-CSF...   \n",
              "26                  5.0  Consequently, cytosolic activation, nuclear tr...   \n",
              "27                  3.0  We propose that Jun plays a bifunctional role ...   \n",
              "28                  1.0  In B lymphoid cells, deltaspi-B and spi-B mRNA...   \n",
              "29                  8.0  In order to study CD14 gene regulation, the hu...   \n",
              "30                  9.0  E3 transcripts were RA-inducible in HL60 cells...   \n",
              "31                  3.0  IL-7 also delayed the decreases in the levels ...   \n",
              "32                  1.0  A marked activation of gamma-promoter activity...   \n",
              "33                  6.0  These results demonstrate that the transcripti...   \n",
              "34                  2.0  The A6H monoclonal antibody (mAb) recognizes a...   \n",
              "35                  1.0  The highly specific granulomonocyte-associated...   \n",
              "36                  1.0  Compared to benign tumor or mammary reduction ...   \n",
              "37                  1.0  Sequence-specific DNA-binding small molecules ...   \n",
              "38                  1.0  Structure function analysis of vitamin D analo...   \n",
              "39                  5.0  To determine the mechanisms responsible for th...   \n",
              "40                  7.0  DNA band-shift analysis reveals NF-kappa B bin...   \n",
              "41                  5.0  RESULTS: Both all-trans and 9-cis RA inhibited...   \n",
              "42                  5.0  Negative selection in the cortex appears to be...   \n",
              "43                  7.0  Nitric oxide-stimulated guanine nucleotide exc...   \n",
              "44                  2.0  We show that the ability to detect NF-ATp in a...   \n",
              "45                  1.0  C3/5 cells developed specific proliferation an...   \n",
              "46                  4.0  Recent reports demonstrated that ionizing radi...   \n",
              "47                  2.0  Here we examine molecular mechanisms controlli...   \n",
              "48                  2.0  These results demonstrate that multiple X1 box...   \n",
              "49                  3.0  IL-2 and IL-7 were equivalent in their ability...   \n",
              "\n",
              "           y  \n",
              "0   positive  \n",
              "1   positive  \n",
              "2   positive  \n",
              "3   positive  \n",
              "4   positive  \n",
              "5   positive  \n",
              "6   negative  \n",
              "7   positive  \n",
              "8   positive  \n",
              "9   positive  \n",
              "10  positive  \n",
              "11  positive  \n",
              "12  positive  \n",
              "13  positive  \n",
              "14  positive  \n",
              "15  positive  \n",
              "16  negative  \n",
              "17  negative  \n",
              "18  positive  \n",
              "19  positive  \n",
              "20  positive  \n",
              "21  positive  \n",
              "22  positive  \n",
              "23  positive  \n",
              "24  positive  \n",
              "25  positive  \n",
              "26  positive  \n",
              "27  positive  \n",
              "28  positive  \n",
              "29  positive  \n",
              "30  negative  \n",
              "31  positive  \n",
              "32  positive  \n",
              "33  positive  \n",
              "34  positive  \n",
              "35  positive  \n",
              "36  positive  \n",
              "37  positive  \n",
              "38  positive  \n",
              "39  positive  \n",
              "40  positive  \n",
              "41  negative  \n",
              "42  negative  \n",
              "43  positive  \n",
              "44  positive  \n",
              "45  positive  \n",
              "46  positive  \n",
              "47  positive  \n",
              "48  positive  \n",
              "49  positive  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-ad4e5d09-24af-4a8c-bf04-61c7f72916ff\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>document</th>\n",
              "      <th>sentence_embedding_small_bert_L2_128</th>\n",
              "      <th>sentiment</th>\n",
              "      <th>sentiment_confidence</th>\n",
              "      <th>text</th>\n",
              "      <th>y</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>These apparent inconsistencies reflect the com...</td>\n",
              "      <td>[-0.540424644947052, 0.6268006563186646, -0.67...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>These apparent inconsistencies reflect the com...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Different glucocorticoid hormones (GCH) show d...</td>\n",
              "      <td>[-0.7596609592437744, -0.5880932807922363, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>3.0</td>\n",
              "      <td>Different glucocorticoid hormones (GCH) show d...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Northern blot analysis of RNA purified from B ...</td>\n",
              "      <td>[-0.5203543305397034, 0.033645179122686386, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>2.0</td>\n",
              "      <td>Northern blot analysis of RNA purified from B ...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>These results indicate that E3 is a hematopoie...</td>\n",
              "      <td>[-0.7046567797660828, 0.33753958344459534, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>4.0</td>\n",
              "      <td>These results indicate that E3 is a hematopoie...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>During recent years, studies of insulin-gene r...</td>\n",
              "      <td>[-0.7543756365776062, 0.4511456787586212, -0.9...</td>\n",
              "      <td>positive</td>\n",
              "      <td>4.0</td>\n",
              "      <td>During recent years, studies of insulin-gene r...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>To our knowledge, this constitutes the first i...</td>\n",
              "      <td>[-1.0517436265945435, -0.0811738669872284, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>3.0</td>\n",
              "      <td>To our knowledge, this constitutes the first i...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>In contrast, gp41 failed to stimulate NF-kappa...</td>\n",
              "      <td>[-0.8322800397872925, 0.5296803116798401, -0.5...</td>\n",
              "      <td>positive</td>\n",
              "      <td>2.0</td>\n",
              "      <td>In contrast, gp41 failed to stimulate NF-kappa...</td>\n",
              "      <td>negative</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>Data obtained from studies in our laboratories...</td>\n",
              "      <td>[-0.7395010590553284, 0.5016824007034302, -0.5...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Data obtained from studies in our laboratories...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>RESULTS: Interleukin-6 protein and mRNA produc...</td>\n",
              "      <td>[-0.06874295324087143, 0.3614305853843689, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>8.0</td>\n",
              "      <td>RESULTS: Interleukin-6 protein and mRNA produc...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>When submitted to an in vitro CD4 cross-linkin...</td>\n",
              "      <td>[-0.24975398182868958, 0.007427605800330639, -...</td>\n",
              "      <td>positive</td>\n",
              "      <td>2.0</td>\n",
              "      <td>When submitted to an in vitro CD4 cross-linkin...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>10</th>\n",
              "      <td>Treatment of human resting T cells with phorbo...</td>\n",
              "      <td>[-0.7139838933944702, -0.15548910200595856, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>5.0</td>\n",
              "      <td>Treatment of human resting T cells with phorbo...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>Distinct DNase-I hypersensitive sites are asso...</td>\n",
              "      <td>[0.16768714785575867, -0.4549502730369568, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Distinct DNase-I hypersensitive sites are asso...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>12</th>\n",
              "      <td>The biological activity of 11 alpha-methyl-1 a...</td>\n",
              "      <td>[-0.6323443055152893, -0.3998635411262512, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>4.0</td>\n",
              "      <td>The biological activity of 11 alpha-methyl-1 a...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>13</th>\n",
              "      <td>These observations indicate that the monovalen...</td>\n",
              "      <td>[-0.5919932723045349, 0.1734667867422104, -0.8...</td>\n",
              "      <td>positive</td>\n",
              "      <td>3.0</td>\n",
              "      <td>These observations indicate that the monovalen...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>14</th>\n",
              "      <td>The M-CSF receptor was first detectable in the...</td>\n",
              "      <td>[-0.526080310344696, 0.1735551655292511, -0.53...</td>\n",
              "      <td>positive</td>\n",
              "      <td>9.0</td>\n",
              "      <td>The M-CSF receptor was first detectable in the...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>15</th>\n",
              "      <td>TNFRI has been recently shown to activate NF-k...</td>\n",
              "      <td>[-0.7427854537963867, -0.3252595067024231, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>9.0</td>\n",
              "      <td>TNFRI has been recently shown to activate NF-k...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>16</th>\n",
              "      <td>The two mutations R217A and R294A caused an in...</td>\n",
              "      <td>[-0.5198838710784912, 0.6203998327255249, -0.6...</td>\n",
              "      <td>positive</td>\n",
              "      <td>3.0</td>\n",
              "      <td>The two mutations R217A and R294A caused an in...</td>\n",
              "      <td>negative</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17</th>\n",
              "      <td>Cipro did not affect the nuclear transcription...</td>\n",
              "      <td>[-0.010745533742010593, 0.36891499161720276, -...</td>\n",
              "      <td>positive</td>\n",
              "      <td>8.0</td>\n",
              "      <td>Cipro did not affect the nuclear transcription...</td>\n",
              "      <td>negative</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>18</th>\n",
              "      <td>Conversely, diurnal rhythmicity persisted in a...</td>\n",
              "      <td>[-0.6558191180229187, -0.4102998673915863, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>8.0</td>\n",
              "      <td>Conversely, diurnal rhythmicity persisted in a...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>19</th>\n",
              "      <td>The immunoglobulin heavy chain (IgH) class swi...</td>\n",
              "      <td>[-0.5788812637329102, 0.16702984273433685, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>2.0</td>\n",
              "      <td>The immunoglobulin heavy chain (IgH) class swi...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>20</th>\n",
              "      <td>Four regions in this DNA fragment interact wit...</td>\n",
              "      <td>[0.4348486065864563, -0.4957105219364166, -0.9...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Four regions in this DNA fragment interact wit...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>21</th>\n",
              "      <td>Previous characterization of the GPIX promoter...</td>\n",
              "      <td>[-1.4378761053085327, 0.35956352949142456, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>4.0</td>\n",
              "      <td>Previous characterization of the GPIX promoter...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>22</th>\n",
              "      <td>Neutrophil maturation was impaired in PEBP2bet...</td>\n",
              "      <td>[-0.9040020704269409, -0.3321942389011383, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>8.0</td>\n",
              "      <td>Neutrophil maturation was impaired in PEBP2bet...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>23</th>\n",
              "      <td>Treatment of normal monocytes with 12-0-tetrad...</td>\n",
              "      <td>[-0.38281142711639404, -0.09264473617076874, -...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Treatment of normal monocytes with 12-0-tetrad...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>24</th>\n",
              "      <td>We have cloned the gene for a new ets-related ...</td>\n",
              "      <td>[-0.06995158642530441, 0.0628473237156868, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>5.0</td>\n",
              "      <td>We have cloned the gene for a new ets-related ...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>25</th>\n",
              "      <td>Expression and genomic configuration of GM-CSF...</td>\n",
              "      <td>[-0.2674736976623535, 0.6871048212051392, -0.5...</td>\n",
              "      <td>positive</td>\n",
              "      <td>4.0</td>\n",
              "      <td>Expression and genomic configuration of GM-CSF...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>26</th>\n",
              "      <td>Consequently, cytosolic activation, nuclear tr...</td>\n",
              "      <td>[-0.11669447273015976, -0.41803038120269775, -...</td>\n",
              "      <td>positive</td>\n",
              "      <td>5.0</td>\n",
              "      <td>Consequently, cytosolic activation, nuclear tr...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>27</th>\n",
              "      <td>We propose that Jun plays a bifunctional role ...</td>\n",
              "      <td>[-0.3439536988735199, -0.11270888149738312, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>3.0</td>\n",
              "      <td>We propose that Jun plays a bifunctional role ...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>28</th>\n",
              "      <td>In B lymphoid cells, deltaspi-B and spi-B mRNA...</td>\n",
              "      <td>[-0.026912417262792587, -0.07306383550167084, ...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>In B lymphoid cells, deltaspi-B and spi-B mRNA...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>29</th>\n",
              "      <td>In order to study CD14 gene regulation, the hu...</td>\n",
              "      <td>[-0.6894456744194031, 0.6227385997772217, -0.4...</td>\n",
              "      <td>positive</td>\n",
              "      <td>8.0</td>\n",
              "      <td>In order to study CD14 gene regulation, the hu...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>30</th>\n",
              "      <td>E3 transcripts were RA-inducible in HL60 cells...</td>\n",
              "      <td>[0.055561263114213943, 0.29671669006347656, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>9.0</td>\n",
              "      <td>E3 transcripts were RA-inducible in HL60 cells...</td>\n",
              "      <td>negative</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>31</th>\n",
              "      <td>IL-7 also delayed the decreases in the levels ...</td>\n",
              "      <td>[-1.2081276178359985, -0.4925746023654938, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>3.0</td>\n",
              "      <td>IL-7 also delayed the decreases in the levels ...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>32</th>\n",
              "      <td>A marked activation of gamma-promoter activity...</td>\n",
              "      <td>[-0.7283461093902588, -0.00887372437864542, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>A marked activation of gamma-promoter activity...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>33</th>\n",
              "      <td>These results demonstrate that the transcripti...</td>\n",
              "      <td>[-0.8078993558883667, 0.024193232879042625, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>6.0</td>\n",
              "      <td>These results demonstrate that the transcripti...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>34</th>\n",
              "      <td>The A6H monoclonal antibody (mAb) recognizes a...</td>\n",
              "      <td>[-0.6170817017555237, 0.176478773355484, -0.77...</td>\n",
              "      <td>positive</td>\n",
              "      <td>2.0</td>\n",
              "      <td>The A6H monoclonal antibody (mAb) recognizes a...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>35</th>\n",
              "      <td>The highly specific granulomonocyte-associated...</td>\n",
              "      <td>[-0.5863608717918396, 0.10614630579948425, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>The highly specific granulomonocyte-associated...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>36</th>\n",
              "      <td>Compared to benign tumor or mammary reduction ...</td>\n",
              "      <td>[0.0020447946153581142, 0.1792808622121811, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Compared to benign tumor or mammary reduction ...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>37</th>\n",
              "      <td>Sequence-specific DNA-binding small molecules ...</td>\n",
              "      <td>[-0.2069147676229477, 0.016566185280680656, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Sequence-specific DNA-binding small molecules ...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>38</th>\n",
              "      <td>Structure function analysis of vitamin D analo...</td>\n",
              "      <td>[-0.4320926070213318, 0.74772709608078, -0.559...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Structure function analysis of vitamin D analo...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>39</th>\n",
              "      <td>To determine the mechanisms responsible for th...</td>\n",
              "      <td>[-0.37579116225242615, 0.5168997049331665, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>5.0</td>\n",
              "      <td>To determine the mechanisms responsible for th...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>40</th>\n",
              "      <td>DNA band-shift analysis reveals NF-kappa B bin...</td>\n",
              "      <td>[-0.04819030314683914, 0.04168706387281418, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>7.0</td>\n",
              "      <td>DNA band-shift analysis reveals NF-kappa B bin...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>41</th>\n",
              "      <td>RESULTS: Both all-trans and 9-cis RA inhibited...</td>\n",
              "      <td>[-0.5226595401763916, -0.33986032009124756, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>5.0</td>\n",
              "      <td>RESULTS: Both all-trans and 9-cis RA inhibited...</td>\n",
              "      <td>negative</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>42</th>\n",
              "      <td>Negative selection in the cortex appears to be...</td>\n",
              "      <td>[-0.022223301231861115, -0.22735534608364105, ...</td>\n",
              "      <td>positive</td>\n",
              "      <td>5.0</td>\n",
              "      <td>Negative selection in the cortex appears to be...</td>\n",
              "      <td>negative</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>43</th>\n",
              "      <td>Nitric oxide-stimulated guanine nucleotide exc...</td>\n",
              "      <td>[-0.3036770224571228, -0.13188859820365906, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>7.0</td>\n",
              "      <td>Nitric oxide-stimulated guanine nucleotide exc...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>44</th>\n",
              "      <td>We show that the ability to detect NF-ATp in a...</td>\n",
              "      <td>[-0.6237607598304749, 0.3612492084503174, -0.4...</td>\n",
              "      <td>positive</td>\n",
              "      <td>2.0</td>\n",
              "      <td>We show that the ability to detect NF-ATp in a...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>45</th>\n",
              "      <td>C3/5 cells developed specific proliferation an...</td>\n",
              "      <td>[-0.26343587040901184, -0.09341304749250412, -...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>C3/5 cells developed specific proliferation an...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>46</th>\n",
              "      <td>Recent reports demonstrated that ionizing radi...</td>\n",
              "      <td>[-0.5906685590744019, 0.20001085102558136, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>4.0</td>\n",
              "      <td>Recent reports demonstrated that ionizing radi...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>47</th>\n",
              "      <td>Here we examine molecular mechanisms controlli...</td>\n",
              "      <td>[-0.4508644938468933, 0.2742282450199127, -0.0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>2.0</td>\n",
              "      <td>Here we examine molecular mechanisms controlli...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>48</th>\n",
              "      <td>These results demonstrate that multiple X1 box...</td>\n",
              "      <td>[-0.3715269863605499, 0.3266661465167999, -0.3...</td>\n",
              "      <td>positive</td>\n",
              "      <td>2.0</td>\n",
              "      <td>These results demonstrate that multiple X1 box...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>49</th>\n",
              "      <td>IL-2 and IL-7 were equivalent in their ability...</td>\n",
              "      <td>[-0.7310452461242676, -0.2004045844078064, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>3.0</td>\n",
              "      <td>IL-2 and IL-7 were equivalent in their ability...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 4
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lVyOE2wV0fw_"
      },
      "source": [
        "# 4. Test the fitted pipe on a new example"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 150
        },
        "id": "qdCUg2MR0PD2",
        "outputId": "0ecf1df7-d5ff-4ed4-e993-9092c7870aad"
      },
      "source": [
        "fitted_pipe.predict(\"The virus had a direct impact on the nervous system\")"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "sentence_detector_dl download started this may take some time.\n",
            "Approximate size to download 354.6 KB\n",
            "[OK!]\n",
            "Warning::Spark Session already created, some configs may not take.\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                                            sentence  \\\n",
              "0  The virus had a direct impact on the nervous s...   \n",
              "\n",
              "                sentence_embedding_small_bert_L2_128 sentiment  \\\n",
              "0  [-0.4990377724170685, 0.34958764910697937, -0....  positive   \n",
              "\n",
              "  sentiment_confidence  \n",
              "0                  1.0  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-525947be-f686-471b-93f5-d270f0a1dac8\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>sentence</th>\n",
              "      <th>sentence_embedding_small_bert_L2_128</th>\n",
              "      <th>sentiment</th>\n",
              "      <th>sentiment_confidence</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>The virus had a direct impact on the nervous s...</td>\n",
              "      <td>[-0.4990377724170685, 0.34958764910697937, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 4
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xflpwrVjjBVD"
      },
      "source": [
        "## 5. Configure pipe training parameters"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "UtsAUGTmOTms",
        "outputId": "3d34ed83-2b77-48ab-960d-054ed7937903"
      },
      "source": [
        "trainable_pipe.print_info()"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",
            ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8)              | Info: Size of every batch | Currently set to : 8\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow')      | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False)             | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128)    | Info: Max sentence length to process | Currently set to : 128\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128)            | Info: Number of embedding dimensions | Currently set to : 128\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False)      | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128')  | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n",
            ">>> component_list['document_assembler'] has settable params:\n",
            "component_list['document_assembler'].setCleanupMode('shrink')                                  | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",
            ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n",
            "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow')                  | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n",
            "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6)                        | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n",
            "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral')             | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n",
            "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128')  | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2GJdDNV9jEIe"
      },
      "source": [
        "## 6. Retrain with new parameters"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "mptfvHx-MMMX",
        "outputId": "1088e9d9-1d05-4b63-d07b-cb23113d9277"
      },
      "source": [
        "# Train longer!\n",
        "trainable_pipe = nlp.load('train.sentiment')\n",
        "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n",
        "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n",
        "# Predict on the same 50 training rows with the fitted pipeline\n",
        "preds = fitted_pipe.predict(train_df.iloc[:50], output_level='document')\n",
        "\n",
        "# The sentence detector in the pipe generates some NaNs, so drop them first\n",
        "preds.dropna(inplace=True)\n",
        "print(classification_report(preds['y'], preds['sentiment']))\n",
        "\n",
        "preds"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Warning::Spark Session already created, some configs may not take.\n",
            "Warning::Spark Session already created, some configs may not take.\n",
            "sent_small_bert_L2_128 download started this may take some time.\n",
            "Approximate size to download 16.1 MB\n",
            "[OK!]\n",
            "              precision    recall  f1-score   support\n",
            "\n",
            "    negative       0.00      0.00      0.00         2\n",
            "    positive       0.96      1.00      0.98        48\n",
            "\n",
            "    accuracy                           0.96        50\n",
            "   macro avg       0.48      0.50      0.49        50\n",
            "weighted avg       0.92      0.96      0.94        50\n",
            "\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                                             document  \\\n",
              "0   Based on nucleotide sequence requirements and ...   \n",
              "1   TINUR belongs to the NGFI-B/nur77 family of th...   \n",
              "2   In selenium-deprived Jurkat and ESb-L T lympho...   \n",
              "3   These findings demonstrate that IFNs inhibit I...   \n",
              "4   These data reveal the presence of distinct com...   \n",
              "5   The translated protein showed weak DNA binding...   \n",
              "6   All tumor cell lines from the B-cell lineage a...   \n",
              "7   GABP factors bind to a distal interleukin 2 (I...   \n",
              "8   In addition, Tax also stimulates the transcrip...   \n",
              "9   Mutation of the TCF-1 alpha binding site dimin...   \n",
              "10  They involve phosphorylation and proteolytic r...   \n",
              "11  Activation of the transcription factor NF-kapp...   \n",
              "12  Neutrophil accumulation and development of lun...   \n",
              "13  Mutations in these binding sites can interfere...   \n",
              "14  To understand the molecular mechanisms of func...   \n",
              "15  Collectively, these results suggest that HOCl ...   \n",
              "16  However, mutation of the AP-1 site markedly di...   \n",
              "17  A novel HIV-1 isolate containing alterations a...   \n",
              "18  These findings suggest that Zp responds direct...   \n",
              "19  Several distinct roles for hsp90 in modulating...   \n",
              "20  In addition to activation of phospholipase C g...   \n",
              "21  Also in the current study, binding activity to...   \n",
              "22  These alterations of transcription factors are...   \n",
              "23  Epstein-Barr virus nuclear antigen 2 and laten...   \n",
              "24  Here, we present the isolation and characteriz...   \n",
              "25  The signaling capabilities of the IL-10R for a...   \n",
              "26  To identify potential cellular homologues of c...   \n",
              "27  In addition, no CIITA protein is detectable in...   \n",
              "28  The expression of AP-1 depended on calcium mob...   \n",
              "29  Nuclear accumulation of NFAT4 opposed by the J...   \n",
              "30  This resistance to apoptosis is reversed by an...   \n",
              "31  During recent years, studies of insulin-gene r...   \n",
              "32  The NFAT protein migrated more slowly in a sod...   \n",
              "33  The effect of DM on expression of IL-2R alpha ...   \n",
              "34  We conclude that TNF-alpha bioavailability and...   \n",
              "35  The two isozymes show little amino acid identi...   \n",
              "36  Intriguingly, surface expression of LT-alpha1b...   \n",
              "37  Binding of the drug inhibits isomerase activit...   \n",
              "38  These genes may then play a role in altering t...   \n",
              "39                     Copyright 1999 Academic Press.   \n",
              "40  Direct exposure to 10 nM 2,3,7,8-TCDD caused a...   \n",
              "41  However, activation of the T cell lines leadin...   \n",
              "42  The defensin sensitivities of Salmonella typhi...   \n",
              "43  Receptors for the Fc portion of immunoglobulin...   \n",
              "44  We have isolated a novel cDNA clone encoding i...   \n",
              "45  IL-10 inhibitory activity is exerted on T lymp...   \n",
              "46  Furthermore, CD40 ligation of a HLA-A2+, Melan...   \n",
              "47  We conclude that interactions between TAFII32 ...   \n",
              "48  These cDNA were 2343 bp long and their transcr...   \n",
              "49  In vitro studies using pure recombinant p21ras...   \n",
              "\n",
              "                 sentence_embedding_small_bert_L2_128 sentiment  \\\n",
              "0   [-0.49277549982070923, 0.09530887007713318, -0...  positive   \n",
              "1   [-0.10481061786413193, 0.1171066015958786, -0....  positive   \n",
              "2   [-1.0812174081802368, 0.5883667469024658, -0.4...  positive   \n",
              "3   [-0.9547467231750488, -0.15689292550086975, -0...  positive   \n",
              "4   [-0.4628618657588959, 0.06154884025454521, -0....  positive   \n",
              "5   [-0.3139030635356903, -0.15748938918113708, -0...  positive   \n",
              "6   [0.20084746181964874, -0.4846010208129883, -0....  positive   \n",
              "7   [-0.26659855246543884, 0.2846565246582031, -1....  positive   \n",
              "8   [-0.822009801864624, 0.6354378461837769, -0.34...  positive   \n",
              "9   [-0.6489402651786804, 0.1254355013370514, -0.5...  positive   \n",
              "10  [-0.8970018029212952, -0.5171773433685303, -0....  positive   \n",
              "11  [-0.5248922109603882, 0.24680814146995544, -0....  positive   \n",
              "12  [-1.0332255363464355, 0.5459337830543518, -1.1...  positive   \n",
              "13  [-0.11175256222486496, 0.06691665202379227, -0...  positive   \n",
              "14  [-0.4407936632633209, 0.782042920589447, -0.50...  positive   \n",
              "15  [-0.6647999882698059, 0.3089727759361267, -0.4...  positive   \n",
              "16  [-0.9864672422409058, 0.05116121098399162, -0....  positive   \n",
              "17  [-0.18733267486095428, 0.06821992248296738, -0...  positive   \n",
              "18  [-0.7554842829704285, -0.054939232766628265, -...  positive   \n",
              "19  [-0.12544415891170502, 0.2649071216583252, -0....  positive   \n",
              "20  [-0.5602053999900818, -0.5014104843139648, -0....  positive   \n",
              "21  [-0.9772286415100098, 0.048676762729883194, -0...  positive   \n",
              "22  [-0.3631213307380676, 0.2459881603717804, -0.8...  positive   \n",
              "23  [-0.1847376972436905, -0.007672342471778393, -...  positive   \n",
              "24  [-0.14775823056697845, -0.014818106777966022, ...  positive   \n",
              "25  [-0.506297767162323, 0.44844114780426025, -0.5...  positive   \n",
              "26  [0.001409502699971199, 0.13168536126613617, -0...  positive   \n",
              "27  [0.09807377308607101, -0.04232050105929375, -0...  positive   \n",
              "28  [-0.9711195826530457, -0.15560954809188843, -0...  positive   \n",
              "29  [-0.22911998629570007, 0.4173714518547058, -0....  positive   \n",
              "30  [-0.3486934304237366, 0.22132794559001923, -0....  positive   \n",
              "31  [-0.7870786786079407, 0.4678671956062317, -0.9...  positive   \n",
              "32  [0.048784464597702026, -0.1957472562789917, -0...  positive   \n",
              "33  [-0.37844932079315186, 0.12598302960395813, -0...  positive   \n",
              "34  [-0.5289692878723145, 0.7431911826133728, -0.7...  positive   \n",
              "35  [0.14430226385593414, -0.674239993095398, -0.7...  positive   \n",
              "36  [-0.2223060429096222, -0.15909825265407562, -0...  positive   \n",
              "37  [-0.6680501699447632, 0.34882932901382446, -0....  positive   \n",
              "38  [-0.8830984830856323, 0.1207333579659462, -0.6...  positive   \n",
              "39  [-1.021697759628296, 0.9163565635681152, -0.25...  positive   \n",
              "40  [-0.8829392194747925, 0.09077997505664825, -0....  positive   \n",
              "41  [-0.2961618900299072, -0.22415119409561157, -0...  positive   \n",
              "42  [-0.4785139262676239, 0.172796368598938, -0.22...  positive   \n",
              "43  [-0.44332364201545715, -0.3984912037849426, -0...  positive   \n",
              "44  [0.11249734461307526, -0.1219027042388916, -0....  positive   \n",
              "45  [-0.07616603374481201, 0.030630899593234062, -...  positive   \n",
              "46  [-0.3785442113876343, 0.29912295937538147, -0....  positive   \n",
              "47  [-0.4170002043247223, -0.09449539333581924, -0...  positive   \n",
              "48  [-0.9243519306182861, 0.7070343494415283, -0.3...  positive   \n",
              "49  [-0.6335657238960266, 0.15302662551403046, -0....  positive   \n",
              "\n",
              "   sentiment_confidence                                               text  \\\n",
              "0                   1.0  Based on nucleotide sequence requirements and ...   \n",
              "1                   1.0  TINUR belongs to the NGFI-B/nur77 family of th...   \n",
              "2                   1.0  In selenium-deprived Jurkat and ESb-L T lympho...   \n",
              "3                   1.0  These findings demonstrate that IFNs inhibit I...   \n",
              "4                   1.0  These data reveal the presence of distinct com...   \n",
              "5                   1.0  The translated protein showed weak DNA binding...   \n",
              "6                   1.0  All tumor cell lines from the B-cell lineage a...   \n",
              "7                   1.0  GABP factors bind to a distal interleukin 2 (I...   \n",
              "8                   1.0  In addition, Tax also stimulates the transcrip...   \n",
              "9                   1.0  Mutation of the TCF-1 alpha binding site dimin...   \n",
              "10                  1.0  They involve phosphorylation and proteolytic r...   \n",
              "11                  1.0  Activation of the transcription factor NF-kapp...   \n",
              "12                  1.0  Neutrophil accumulation and development of lun...   \n",
              "13                  1.0  Mutations in these binding sites can interfere...   \n",
              "14                  1.0  To understand the molecular mechanisms of func...   \n",
              "15                  1.0  Collectively, these results suggest that HOCl ...   \n",
              "16                  1.0  However, mutation of the AP-1 site markedly di...   \n",
              "17                  1.0  A novel HIV-1 isolate containing alterations a...   \n",
              "18                  1.0  These findings suggest that Zp responds direct...   \n",
              "19                  1.0  Several distinct roles for hsp90 in modulating...   \n",
              "20                  1.0  In addition to activation of phospholipase C g...   \n",
              "21                  1.0  Also in the current study, binding activity to...   \n",
              "22                  1.0  These alterations of transcription factors are...   \n",
              "23                  1.0  Epstein-Barr virus nuclear antigen 2 and laten...   \n",
              "24                  1.0  Here, we present the isolation and characteriz...   \n",
              "25                  1.0  The signaling capabilities of the IL-10R for a...   \n",
              "26                  1.0  To identify potential cellular homologues of c...   \n",
              "27                  1.0  In addition, no CIITA protein is detectable in...   \n",
              "28                  1.0  The expression of AP-1 depended on calcium mob...   \n",
              "29                  1.0  Nuclear accumulation of NFAT4 opposed by the J...   \n",
              "30                  1.0  This resistance to apoptosis is reversed by an...   \n",
              "31                  1.0  During recent years, studies of insulin-gene r...   \n",
              "32                  1.0  The NFAT protein migrated more slowly in a sod...   \n",
              "33                  1.0  The effect of DM on expression of IL-2R alpha ...   \n",
              "34                  1.0  We conclude that TNF-alpha bioavailability and...   \n",
              "35                  1.0  The two isozymes show little amino acid identi...   \n",
              "36                  1.0  Intriguingly, surface expression of LT-alpha1b...   \n",
              "37                  1.0  Binding of the drug inhibits isomerase activit...   \n",
              "38                  1.0  These genes may then play a role in altering t...   \n",
              "39                  1.0                     Copyright 1999 Academic Press.   \n",
              "40                  1.0  Direct exposure to 10 nM 2,3,7,8-TCDD caused a...   \n",
              "41                  1.0  However, activation of the T cell lines leadin...   \n",
              "42                  1.0  The defensin sensitivities of Salmonella typhi...   \n",
              "43                  1.0  Receptors for the Fc portion of immunoglobulin...   \n",
              "44                  1.0  We have isolated a novel cDNA clone encoding i...   \n",
              "45                  1.0  IL-10 inhibitory activity is exerted on T lymp...   \n",
              "46                  1.0  Furthermore, CD40 ligation of a HLA-A2+, Melan...   \n",
              "47                  1.0  We conclude that interactions between TAFII32 ...   \n",
              "48                  1.0  These cDNA were 2343 bp long and their transcr...   \n",
              "49                  1.0  In vitro studies using pure recombinant p21ras...   \n",
              "\n",
              "           y  \n",
              "0   positive  \n",
              "1   positive  \n",
              "2   positive  \n",
              "3   positive  \n",
              "4   positive  \n",
              "5   positive  \n",
              "6   positive  \n",
              "7   positive  \n",
              "8   positive  \n",
              "9   positive  \n",
              "10  positive  \n",
              "11  positive  \n",
              "12  positive  \n",
              "13  positive  \n",
              "14  positive  \n",
              "15  positive  \n",
              "16  positive  \n",
              "17  positive  \n",
              "18  positive  \n",
              "19  positive  \n",
              "20  positive  \n",
              "21  positive  \n",
              "22  positive  \n",
              "23  positive  \n",
              "24  positive  \n",
              "25  positive  \n",
              "26  positive  \n",
              "27  negative  \n",
              "28  positive  \n",
              "29  positive  \n",
              "30  positive  \n",
              "31  positive  \n",
              "32  positive  \n",
              "33  positive  \n",
              "34  positive  \n",
              "35  positive  \n",
              "36  positive  \n",
              "37  negative  \n",
              "38  positive  \n",
              "39  positive  \n",
              "40  positive  \n",
              "41  positive  \n",
              "42  positive  \n",
              "43  positive  \n",
              "44  positive  \n",
              "45  positive  \n",
              "46  positive  \n",
              "47  positive  \n",
              "48  positive  \n",
              "49  positive  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-afc38e5e-089f-431b-b68a-726ddae25361\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>document</th>\n",
              "      <th>sentence_embedding_small_bert_L2_128</th>\n",
              "      <th>sentiment</th>\n",
              "      <th>sentiment_confidence</th>\n",
              "      <th>text</th>\n",
              "      <th>y</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Based on nucleotide sequence requirements and ...</td>\n",
              "      <td>[-0.49277549982070923, 0.09530887007713318, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Based on nucleotide sequence requirements and ...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>TINUR belongs to the NGFI-B/nur77 family of th...</td>\n",
              "      <td>[-0.10481061786413193, 0.1171066015958786, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>TINUR belongs to the NGFI-B/nur77 family of th...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>In selenium-deprived Jurkat and ESb-L T lympho...</td>\n",
              "      <td>[-1.0812174081802368, 0.5883667469024658, -0.4...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>In selenium-deprived Jurkat and ESb-L T lympho...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>These findings demonstrate that IFNs inhibit I...</td>\n",
              "      <td>[-0.9547467231750488, -0.15689292550086975, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>These findings demonstrate that IFNs inhibit I...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>These data reveal the presence of distinct com...</td>\n",
              "      <td>[-0.4628618657588959, 0.06154884025454521, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>These data reveal the presence of distinct com...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>The translated protein showed weak DNA binding...</td>\n",
              "      <td>[-0.3139030635356903, -0.15748938918113708, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>The translated protein showed weak DNA binding...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>All tumor cell lines from the B-cell lineage a...</td>\n",
              "      <td>[0.20084746181964874, -0.4846010208129883, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>All tumor cell lines from the B-cell lineage a...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>GABP factors bind to a distal interleukin 2 (I...</td>\n",
              "      <td>[-0.26659855246543884, 0.2846565246582031, -1....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>GABP factors bind to a distal interleukin 2 (I...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>In addition, Tax also stimulates the transcrip...</td>\n",
              "      <td>[-0.822009801864624, 0.6354378461837769, -0.34...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>In addition, Tax also stimulates the transcrip...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>Mutation of the TCF-1 alpha binding site dimin...</td>\n",
              "      <td>[-0.6489402651786804, 0.1254355013370514, -0.5...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Mutation of the TCF-1 alpha binding site dimin...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>10</th>\n",
              "      <td>They involve phosphorylation and proteolytic r...</td>\n",
              "      <td>[-0.8970018029212952, -0.5171773433685303, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>They involve phosphorylation and proteolytic r...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>Activation of the transcription factor NF-kapp...</td>\n",
              "      <td>[-0.5248922109603882, 0.24680814146995544, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Activation of the transcription factor NF-kapp...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>12</th>\n",
              "      <td>Neutrophil accumulation and development of lun...</td>\n",
              "      <td>[-1.0332255363464355, 0.5459337830543518, -1.1...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Neutrophil accumulation and development of lun...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>13</th>\n",
              "      <td>Mutations in these binding sites can interfere...</td>\n",
              "      <td>[-0.11175256222486496, 0.06691665202379227, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Mutations in these binding sites can interfere...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>14</th>\n",
              "      <td>To understand the molecular mechanisms of func...</td>\n",
              "      <td>[-0.4407936632633209, 0.782042920589447, -0.50...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>To understand the molecular mechanisms of func...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>15</th>\n",
              "      <td>Collectively, these results suggest that HOCl ...</td>\n",
              "      <td>[-0.6647999882698059, 0.3089727759361267, -0.4...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Collectively, these results suggest that HOCl ...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>16</th>\n",
              "      <td>However, mutation of the AP-1 site markedly di...</td>\n",
              "      <td>[-0.9864672422409058, 0.05116121098399162, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>However, mutation of the AP-1 site markedly di...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17</th>\n",
              "      <td>A novel HIV-1 isolate containing alterations a...</td>\n",
              "      <td>[-0.18733267486095428, 0.06821992248296738, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>A novel HIV-1 isolate containing alterations a...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>18</th>\n",
              "      <td>These findings suggest that Zp responds direct...</td>\n",
              "      <td>[-0.7554842829704285, -0.054939232766628265, -...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>These findings suggest that Zp responds direct...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>19</th>\n",
              "      <td>Several distinct roles for hsp90 in modulating...</td>\n",
              "      <td>[-0.12544415891170502, 0.2649071216583252, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Several distinct roles for hsp90 in modulating...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>20</th>\n",
              "      <td>In addition to activation of phospholipase C g...</td>\n",
              "      <td>[-0.5602053999900818, -0.5014104843139648, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>In addition to activation of phospholipase C g...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>21</th>\n",
              "      <td>Also in the current study, binding activity to...</td>\n",
              "      <td>[-0.9772286415100098, 0.048676762729883194, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Also in the current study, binding activity to...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>22</th>\n",
              "      <td>These alterations of transcription factors are...</td>\n",
              "      <td>[-0.3631213307380676, 0.2459881603717804, -0.8...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>These alterations of transcription factors are...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>23</th>\n",
              "      <td>Epstein-Barr virus nuclear antigen 2 and laten...</td>\n",
              "      <td>[-0.1847376972436905, -0.007672342471778393, -...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Epstein-Barr virus nuclear antigen 2 and laten...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>24</th>\n",
              "      <td>Here, we present the isolation and characteriz...</td>\n",
              "      <td>[-0.14775823056697845, -0.014818106777966022, ...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Here, we present the isolation and characteriz...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>25</th>\n",
              "      <td>The signaling capabilities of the IL-10R for a...</td>\n",
              "      <td>[-0.506297767162323, 0.44844114780426025, -0.5...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>The signaling capabilities of the IL-10R for a...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>26</th>\n",
              "      <td>To identify potential cellular homologues of c...</td>\n",
              "      <td>[0.001409502699971199, 0.13168536126613617, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>To identify potential cellular homologues of c...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>27</th>\n",
              "      <td>In addition, no CIITA protein is detectable in...</td>\n",
              "      <td>[0.09807377308607101, -0.04232050105929375, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>In addition, no CIITA protein is detectable in...</td>\n",
              "      <td>negative</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>28</th>\n",
              "      <td>The expression of AP-1 depended on calcium mob...</td>\n",
              "      <td>[-0.9711195826530457, -0.15560954809188843, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>The expression of AP-1 depended on calcium mob...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>29</th>\n",
              "      <td>Nuclear accumulation of NFAT4 opposed by the J...</td>\n",
              "      <td>[-0.22911998629570007, 0.4173714518547058, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Nuclear accumulation of NFAT4 opposed by the J...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>30</th>\n",
              "      <td>This resistance to apoptosis is reversed by an...</td>\n",
              "      <td>[-0.3486934304237366, 0.22132794559001923, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>This resistance to apoptosis is reversed by an...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>31</th>\n",
              "      <td>During recent years, studies of insulin-gene r...</td>\n",
              "      <td>[-0.7870786786079407, 0.4678671956062317, -0.9...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>During recent years, studies of insulin-gene r...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>32</th>\n",
              "      <td>The NFAT protein migrated more slowly in a sod...</td>\n",
              "      <td>[0.048784464597702026, -0.1957472562789917, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>The NFAT protein migrated more slowly in a sod...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>33</th>\n",
              "      <td>The effect of DM on expression of IL-2R alpha ...</td>\n",
              "      <td>[-0.37844932079315186, 0.12598302960395813, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>The effect of DM on expression of IL-2R alpha ...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>34</th>\n",
              "      <td>We conclude that TNF-alpha bioavailability and...</td>\n",
              "      <td>[-0.5289692878723145, 0.7431911826133728, -0.7...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>We conclude that TNF-alpha bioavailability and...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>35</th>\n",
              "      <td>The two isozymes show little amino acid identi...</td>\n",
              "      <td>[0.14430226385593414, -0.674239993095398, -0.7...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>The two isozymes show little amino acid identi...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>36</th>\n",
              "      <td>Intriguingly, surface expression of LT-alpha1b...</td>\n",
              "      <td>[-0.2223060429096222, -0.15909825265407562, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Intriguingly, surface expression of LT-alpha1b...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>37</th>\n",
              "      <td>Binding of the drug inhibits isomerase activit...</td>\n",
              "      <td>[-0.6680501699447632, 0.34882932901382446, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Binding of the drug inhibits isomerase activit...</td>\n",
              "      <td>negative</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>38</th>\n",
              "      <td>These genes may then play a role in altering t...</td>\n",
              "      <td>[-0.8830984830856323, 0.1207333579659462, -0.6...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>These genes may then play a role in altering t...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>39</th>\n",
              "      <td>Copyright 1999 Academic Press.</td>\n",
              "      <td>[-1.021697759628296, 0.9163565635681152, -0.25...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Copyright 1999 Academic Press.</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>40</th>\n",
              "      <td>Direct exposure to 10 nM 2,3,7,8-TCDD caused a...</td>\n",
              "      <td>[-0.8829392194747925, 0.09077997505664825, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Direct exposure to 10 nM 2,3,7,8-TCDD caused a...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>41</th>\n",
              "      <td>However, activation of the T cell lines leadin...</td>\n",
              "      <td>[-0.2961618900299072, -0.22415119409561157, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>However, activation of the T cell lines leadin...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>42</th>\n",
              "      <td>The defensin sensitivities of Salmonella typhi...</td>\n",
              "      <td>[-0.4785139262676239, 0.172796368598938, -0.22...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>The defensin sensitivities of Salmonella typhi...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>43</th>\n",
              "      <td>Receptors for the Fc portion of immunoglobulin...</td>\n",
              "      <td>[-0.44332364201545715, -0.3984912037849426, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Receptors for the Fc portion of immunoglobulin...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>44</th>\n",
              "      <td>We have isolated a novel cDNA clone encoding i...</td>\n",
              "      <td>[0.11249734461307526, -0.1219027042388916, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>We have isolated a novel cDNA clone encoding i...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>45</th>\n",
              "      <td>IL-10 inhibitory activity is exerted on T lymp...</td>\n",
              "      <td>[-0.07616603374481201, 0.030630899593234062, -...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>IL-10 inhibitory activity is exerted on T lymp...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>46</th>\n",
              "      <td>Furthermore, CD40 ligation of a HLA-A2+, Melan...</td>\n",
              "      <td>[-0.3785442113876343, 0.29912295937538147, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>Furthermore, CD40 ligation of a HLA-A2+, Melan...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>47</th>\n",
              "      <td>We conclude that interactions between TAFII32 ...</td>\n",
              "      <td>[-0.4170002043247223, -0.09449539333581924, -0...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>We conclude that interactions between TAFII32 ...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>48</th>\n",
              "      <td>These cDNA were 2343 bp long and their transcr...</td>\n",
              "      <td>[-0.9243519306182861, 0.7070343494415283, -0.3...</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>These cDNA were 2343 bp long and their transcr...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>49</th>\n",
              "      <td>In vitro studies using pure recombinant p21ras...</td>\n",
              "      <td>[-0.6335657238960266, 0.15302662551403046, -0....</td>\n",
              "      <td>positive</td>\n",
              "      <td>1.0</td>\n",
              "      <td>In vitro studies using pure recombinant p21ras...</td>\n",
              "      <td>positive</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "\n",
              "\n",
              "<div id=\"df-aa0d0664-4098-4383-9201-137bd83cbd79\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-aa0d0664-4098-4383-9201-137bd83cbd79')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-aa0d0664-4098-4383-9201-137bd83cbd79 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 6
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qFoT-s1MjTSS"
      },
      "source": [
        "# 7. Try training with different Embeddings"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "nxWFzQOhjWC8",
        "outputId": "38055466-53e0-462c-91da-be2baf02b48f"
      },
      "source": [
        "# We can use nlu.print_components(action='embed_sentence') to see every possibler sentence embedding we could use. Lets use bert!\n",
        "nlp.nlu.print_components(action='embed_sentence')"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "For language <am> NLU provides the following Models : \n",
            "nlu.load('am.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_amharic\n",
            "For language <de> NLU provides the following Models : \n",
            "nlu.load('de.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n",
            "For language <el> NLU provides the following Models : \n",
            "nlu.load('el.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n",
            "For language <en> NLU provides the following Models : \n",
            "nlu.load('en.embed_sentence') returns Spark NLP model_anno_obj tfhub_use\n",
            "nlu.load('en.embed_sentence.albert') returns Spark NLP model_anno_obj albert_base_uncased\n",
            "nlu.load('en.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_base_uncased\n",
            "nlu.load('en.embed_sentence.bert.base_uncased_legal') returns Spark NLP model_anno_obj sent_bert_base_uncased_legal\n",
            "nlu.load('en.embed_sentence.bert.finetuned') returns Spark NLP model_anno_obj sbert_setfit_finetuned_financial_text_classification\n",
            "nlu.load('en.embed_sentence.bert.pubmed') returns Spark NLP model_anno_obj sent_bert_pubmed\n",
            "nlu.load('en.embed_sentence.bert.pubmed_squad2') returns Spark NLP model_anno_obj sent_bert_pubmed_squad2\n",
            "nlu.load('en.embed_sentence.bert.wiki_books') returns Spark NLP model_anno_obj sent_bert_wiki_books\n",
            "nlu.load('en.embed_sentence.bert.wiki_books_mnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_mnli\n",
            "nlu.load('en.embed_sentence.bert.wiki_books_qnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_qnli\n",
            "nlu.load('en.embed_sentence.bert.wiki_books_qqp') returns Spark NLP model_anno_obj sent_bert_wiki_books_qqp\n",
            "nlu.load('en.embed_sentence.bert.wiki_books_squad2') returns Spark NLP model_anno_obj sent_bert_wiki_books_squad2\n",
            "nlu.load('en.embed_sentence.bert.wiki_books_sst2') returns Spark NLP model_anno_obj sent_bert_wiki_books_sst2\n",
            "nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n",
            "nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n",
            "nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model_anno_obj sent_bert_large_cased\n",
            "nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model_anno_obj sent_bert_large_uncased\n",
            "nlu.load('en.embed_sentence.bert_use_cmlm_en_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_base\n",
            "nlu.load('en.embed_sentence.bert_use_cmlm_en_large') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_large\n",
            "nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model_anno_obj sent_biobert_clinical_base_cased\n",
            "nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model_anno_obj sent_biobert_discharge_base_cased\n",
            "nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pmc_base_cased\n",
            "nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_base_cased\n",
            "nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_large_cased\n",
            "nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_pmc_base_cased\n",
            "nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model_anno_obj sent_covidbert_large_uncased\n",
            "nlu.load('en.embed_sentence.distil_roberta.distilled_base') returns Spark NLP model_anno_obj sent_distilroberta_base\n",
            "nlu.load('en.embed_sentence.doc2vec') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n",
            "nlu.load('en.embed_sentence.doc2vec.gigaword_300') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n",
            "nlu.load('en.embed_sentence.doc2vec.gigaword_wiki_300') returns Spark NLP model_anno_obj doc2vec_gigaword_wiki_300\n",
            "nlu.load('en.embed_sentence.electra') returns Spark NLP model_anno_obj sent_electra_small_uncased\n",
            "nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model_anno_obj sent_electra_base_uncased\n",
            "nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model_anno_obj sent_electra_large_uncased\n",
            "nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model_anno_obj sent_electra_small_uncased\n",
            "nlu.load('en.embed_sentence.roberta.base') returns Spark NLP model_anno_obj sent_roberta_base\n",
            "nlu.load('en.embed_sentence.roberta.large') returns Spark NLP model_anno_obj sent_roberta_large\n",
            "nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model_anno_obj sent_small_bert_L10_128\n",
            "nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model_anno_obj sent_small_bert_L10_256\n",
            "nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model_anno_obj sent_small_bert_L10_512\n",
            "nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model_anno_obj sent_small_bert_L10_768\n",
            "nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model_anno_obj sent_small_bert_L12_128\n",
            "nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model_anno_obj sent_small_bert_L12_256\n",
            "nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model_anno_obj sent_small_bert_L12_512\n",
            "nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model_anno_obj sent_small_bert_L12_768\n",
            "nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model_anno_obj sent_small_bert_L2_128\n",
            "nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model_anno_obj sent_small_bert_L2_256\n",
            "nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model_anno_obj sent_small_bert_L2_512\n",
            "nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model_anno_obj sent_small_bert_L2_768\n",
            "nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model_anno_obj sent_small_bert_L4_128\n",
            "nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model_anno_obj sent_small_bert_L4_256\n",
            "nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model_anno_obj sent_small_bert_L4_512\n",
            "nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model_anno_obj sent_small_bert_L4_768\n",
            "nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model_anno_obj sent_small_bert_L6_128\n",
            "nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model_anno_obj sent_small_bert_L6_256\n",
            "nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model_anno_obj sent_small_bert_L6_512\n",
            "nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model_anno_obj sent_small_bert_L6_768\n",
            "nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model_anno_obj sent_small_bert_L8_128\n",
            "nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model_anno_obj sent_small_bert_L8_256\n",
            "nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model_anno_obj sent_small_bert_L8_512\n",
            "nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model_anno_obj sent_small_bert_L8_768\n",
            "nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model_anno_obj tfhub_use\n",
            "nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n",
            "nlu.load('en.embed_sentence.use') returns Spark NLP model_anno_obj tfhub_use\n",
            "nlu.load('en.embed_sentence.use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n",
            "For language <es> NLU provides the following Models : \n",
            "nlu.load('es.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n",
            "nlu.load('es.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n",
            "For language <fi> NLU provides the following Models : \n",
            "nlu.load('fi.embed_sentence.bert') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n",
            "nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model_anno_obj bert_base_finnish_cased\n",
            "nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n",
            "For language <ha> NLU provides the following Models : \n",
            "nlu.load('ha.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_hausa\n",
            "For language <ig> NLU provides the following Models : \n",
            "nlu.load('ig.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_igbo\n",
            "For language <lg> NLU provides the following Models : \n",
            "nlu.load('lg.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_luganda\n",
            "For language <nl> NLU provides the following Models : \n",
            "nlu.load('nl.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n",
            "For language <pcm> NLU provides the following Models : \n",
            "nlu.load('pcm.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_naija\n",
            "For language <pt> NLU provides the following Models : \n",
            "nlu.load('pt.embed_sentence.bert.base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_base_tsdae_sts\n",
            "nlu.load('pt.embed_sentence.bert.cased_large_legal') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.1\n",
            "nlu.load('pt.embed_sentence.bert.large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_gpl_sts\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.10.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.10\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.2.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.2\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.3.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.3\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.4.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.4\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.5.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.5\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.7.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.7\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.8.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.8\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.9.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.9\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v1.0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v1.0\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v0\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v1\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v0\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v1\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v0\n",
            "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v1\n",
            "nlu.load('pt.embed_sentence.bert.v2_base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma_v2\n",
            "nlu.load('pt.embed_sentence.bert.v2_large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v2\n",
            "nlu.load('pt.embed_sentence.bertimbau.legal.assin.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma\n",
            "nlu.load('pt.embed_sentence.bertimbau.legal.assin2.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base\n",
            "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large\n",
            "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma\n",
            "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma_v3.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma_v3\n",
            "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts\n",
            "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts_v4.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v4\n",
            "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_v4_gpl_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_v4_gpl_sts\n",
            "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_sts_v2.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_v2\n",
            "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_v2_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_v2_sts\n",
            "For language <rw> NLU provides the following Models : \n",
            "nlu.load('rw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_kinyarwanda\n",
            "For language <sv> NLU provides the following Models : \n",
            "nlu.load('sv.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n",
            "For language <sw> NLU provides the following Models : \n",
            "nlu.load('sw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_swahili\n",
            "For language <wo> NLU provides the following Models : \n",
            "nlu.load('wo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_wolof\n",
            "For language <xx> NLU provides the following Models : \n",
            "nlu.load('xx.embed_sentence') returns Spark NLP model_anno_obj sent_bert_multi_cased\n",
            "nlu.load('xx.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_multi_cased\n",
            "nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model_anno_obj sent_bert_multi_cased\n",
            "nlu.load('xx.embed_sentence.bert.muril') returns Spark NLP model_anno_obj sent_bert_muril\n",
            "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base\n",
            "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base_br') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base_br\n",
            "nlu.load('xx.embed_sentence.labse') returns Spark NLP model_anno_obj labse\n",
            "nlu.load('xx.embed_sentence.xlm_roberta.base') returns Spark NLP model_anno_obj sent_xlm_roberta_base\n",
            "For language <yo> NLU provides the following Models : \n",
            "nlu.load('yo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_yoruba\n",
            "For language <zh> NLU provides the following Models : \n",
            "nlu.load('zh.embed_sentence.bert') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1\n",
            "nlu.load('zh.embed_sentence.bert.distilled') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1_distill\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "IKK_Ii_gjJfF",
        "outputId": "6c32478b-c999-499f-ca4a-38a992c4e950"
      },
      "source": [
        "trainable_pipe = nlp.load('en.embed_sentence.small_bert_L12_128 train.sentiment')\n",
        "# We need to train longer and user smaller LR for NON-USE based sentence embeddings usually\n",
        "# We could tune the hyperparameters further with hyperparameter tuning methods like gridsearch\n",
        "# Also longer training gives more accuracy\n",
        "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120)\n",
        "trainable_pipe['trainable_sentiment_dl'].setLr(0.0005)\n",
        "fitted_pipe = trainable_pipe.fit(train_df[:1000])\n",
        "\n",
        "# predict with the trainable pipeline on dataset and get predictions\n",
        "preds = fitted_pipe.predict(train_df[:1000],output_level='document')\n",
        "\n",
        "#sentence detector that is part of the pipe generates sone NaNs. lets drop them first\n",
        "preds.dropna(inplace=True)\n",
        "print(classification_report(preds['y'], preds['sentiment']))\n",
        "\n",
        "#preds"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Warning::Spark Session already created, some configs may not take.\n",
            "Warning::Spark Session already created, some configs may not take.\n",
            "sent_small_bert_L12_128 download started this may take some time.\n",
            "Approximate size to download 23.4 MB\n",
            "[OK!]\n",
            "              precision    recall  f1-score   support\n",
            "\n",
            "    negative       0.00      0.00      0.00       132\n",
            "    positive       0.87      1.00      0.93       868\n",
            "\n",
            "    accuracy                           0.87      1000\n",
            "   macro avg       0.43      0.50      0.46      1000\n",
            "weighted avg       0.75      0.87      0.81      1000\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_1jxw3GnVGlI"
      },
      "source": [
        "# 7.1 evaluate on Test Data"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Fxx4yNkNVGFl",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "d01316a0-0e7c-4155-e8b0-8b28715aa921"
      },
      "source": [
        "preds = fitted_pipe.predict(test_df,output_level='document')\n",
        "\n",
        "#sentence detector that is part of the pipe generates sone NaNs. lets drop them first\n",
        "preds.dropna(inplace=True)\n",
        "print(classification_report(preds['y'], preds['sentiment']))"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "              precision    recall  f1-score   support\n",
            "\n",
            "    negative       0.00      0.00      0.00       348\n",
            "    positive       0.85      1.00      0.92      2051\n",
            "\n",
            "    accuracy                           0.85      2399\n",
            "   macro avg       0.43      0.50      0.46      2399\n",
            "weighted avg       0.73      0.85      0.79      2399\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2BB-NwZUoHSe"
      },
      "source": [
        "# 8. Lets save the model"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "eLex095goHwm"
      },
      "source": [
        "stored_model_path = './models/classifier_dl_trained'\n",
        "fitted_pipe.save(stored_model_path)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e_b2DPd4rCiU"
      },
      "source": [
        "# 9. Lets load the model from HDD.\n",
        "This makes Offlien NLU usage possible!   \n",
        "You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 133
        },
        "id": "SO4uz45MoRgp",
        "outputId": "e4464877-4bd1-469f-cb1b-651aea1e9bec"
      },
      "source": [
        "hdd_pipe = nlp.load(path=stored_model_path)\n",
        "\n",
        "preds = hdd_pipe.predict('The virus had a direct impact on the nervous system')\n",
        "preds"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Warning::Spark Session already created, some configs may not take.\n",
            "Warning::Spark Session already created, some configs may not take.\n",
            "Warning::Spark Session already created, some configs may not take.\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                                            document  \\\n",
              "0  The virus had a direct impact on the nervous s...   \n",
              "\n",
              "                        sentence_embedding_from_disk sentiment  \\\n",
              "0  [0.6362331509590149, 0.006696224212646484, 0.2...  positive   \n",
              "\n",
              "  sentiment_confidence  \n",
              "0                  4.0  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-b1094a44-7a8b-4a88-9a52-3fc3309bb9e2\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>document</th>\n",
              "      <th>sentence_embedding_from_disk</th>\n",
              "      <th>sentiment</th>\n",
              "      <th>sentiment_confidence</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>The virus had a direct impact on the nervous s...</td>\n",
              "      <td>[0.6362331509590149, 0.006696224212646484, 0.2...</td>\n",
              "      <td>positive</td>\n",
              "      <td>4.0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-b1094a44-7a8b-4a88-9a52-3fc3309bb9e2')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-b1094a44-7a8b-4a88-9a52-3fc3309bb9e2 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-b1094a44-7a8b-4a88-9a52-3fc3309bb9e2');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 37
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "e0CVlkk9v6Qi",
        "outputId": "1151c44d-6bc8-4c03-95b9-f495cd366a94"
      },
      "source": [
        "hdd_pipe.print_info()"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",
            ">>> component_list['document_assembler'] has settable params:\n",
            "component_list['document_assembler'].setCleanupMode('shrink')                                    | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",
            ">>> component_list['bert_sentence_embeddings@sent_small_bert_L12_128'] has settable params:\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setBatchSize(8)               | Info: Size of every batch | Currently set to : 8\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setCaseSensitive(False)       | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setDimension(128)             | Info: Number of embedding dimensions | Currently set to : 128\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setMaxSentenceLength(128)     | Info: Max sentence length to process | Currently set to : 128\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setEngine('tensorflow')       | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setIsLong(False)              | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n",
            "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setStorageRef('sent_small_bert_L12_128')  | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_128\n",
            ">>> component_list['sentiment_dl@sent_small_bert_L12_128'] has settable params:\n",
            "component_list['sentiment_dl@sent_small_bert_L12_128'].setThreshold(0.6)                         | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n",
            "component_list['sentiment_dl@sent_small_bert_L12_128'].setThresholdLabel('neutral')              | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n",
            "component_list['sentiment_dl@sent_small_bert_L12_128'].setEngine('tensorflow')                   | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n",
            "component_list['sentiment_dl@sent_small_bert_L12_128'].setClasses(['positive', 'negative'])      | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n",
            "component_list['sentiment_dl@sent_small_bert_L12_128'].setStorageRef('sent_small_bert_L12_128')  | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_128\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "CtOuwWgAvqXw"
      },
      "source": [],
      "execution_count": null,
      "outputs": []
    }
  ]
}